3114) and eyes (3104-6, 3115-3118). Only the H and S
components are used for detecting the facial area, and
only the S and V components for the mouth within the
facial area. A face vector may be generated (50) using
the tracking signals.
Description

The invention relates to an image processing method and apparatus, and more particularly to reception of a subject
facial input image signal and generation of a feature extraction tracking signal representing facial features of the
subject.

In the paper "Realtime Facial Image Recognition in Unconstrained Environment for Interactive Visual Interface" by
Hasegawa et al., published in ACCV '93, Asian Conference on Computer Vision, November 23-25, Osaka, Japan, pp. 763-766,
a system is outlined in which features such as the eyes and the nose are extracted as edges. The tracking signal
may be processed for integration of facial features or for monitoring eye contact. Because feature extraction involves
monitoring feature edges, the amount of useful information would appear to be limited. Little technical information is
given in the paper to describe how the system operates. However, it is mentioned that RGB colour information is
calculated. The use of RGB information generally leads to high complexity in the image processing circuits.

The paper by J.F.S. Yau and N.D. Duffy entitled "A Feature Tracking Method for Motion Parameter Estimation in a
Model-Based Coding Application", presented at the Third International Conference on Image Processing and its
Applications held at Warwick on 18-20th July 1989 and published in IEE Conference Publication No. 307 at pages 531 to 535,
describes a method of tracking a face. In this method, there is a first phase which involves tracking the eye, nose and
mouth over the image sequence. This is achieved by locating the facial features within the first frame and then tracking
them over subsequent frames using block searching and code-book techniques. The result of the first tracking phase is
a description of the trajectory of facial feature boxes over the image sequence along the temporal axis. There is then a
second phase which involves motion parameter estimation, whereby the spatial distribution of the facial feature boxes
for each frame is interpreted to provide an estimate of position and orientation. In this way, the dynamics of facial
movement are parameterised for application in a three-dimensional model-based image coding scheme.

The output signal represents a facial image having feature extraction information. It appears that this prior method
is intolerant to occlusion, as once tracking of a feature is lost it has difficulty re-locating it. Further, processing must of
necessity be complex as the interrelationship of boxes is analysed on a frame-by-frame basis.

Japanese Patent Specification No. JP 02141880 (Graphic Communication Technologies) describes a system
whereby an image signal is divided into a grid of regions and there is analysis of each region separately. The evaluation
is performed on a single image and does not involve processing from frame to frame, and the purpose of the system is
to discriminate a face in an image.

Japanese Patent Specification No. JP 63142986 (NEC) describes a system which detects the facial area of an
image according to detection of mouth movement. A suggested application for the system is that of obtaining the image
of a face and overlaying it upon a picture of clean clothing. Accordingly, there is limited feature extraction in these
systems, and also therefore little versatility.

In general, it could be said that the prior art shows limited feature extraction.

The invention is directed towards providing a method and apparatus for feature extraction which involves less
complexity than heretofore.

Another object is to provide a tracking signal which is of more benefit than heretofore for down-stream processing
with a wider range of applications.

The invention is characterised in that :-

the input image signal is in H,S,V format;

a facial area location signal is generated by passing at least part of the input image signal through a band pass filter
and analysing the output of the filter;

a mouth location signal is generated by passing at least part of the input image signal through a band pass filter
and analysing the output of the filter within the facial pixel area according to the facial area location signal;

eye location signals are generated by processing at least part of the input image signal within the facial pixel area
according to the facial area location signal; and

the facial area location, mouth location and eye location signals are outputted as output tracking signals.

In one embodiment, only two of the H,S,V input image components are used for generation of the facial area
location signal.

Preferably, the H and S components are passed through the band pass filter for generation of the facial area
location signal.

In one embodiment, only two of the H,S,V input image components are used for generation of the mouth location
signal.

Preferably, the S and V components are passed through the band pass filter for generation of the mouth area
location signal.

In one embodiment, the band pass filter output signals are analysed by mapping the output data over the pixel area
and generating a projection in a mapping axis and analysing said projection, and preferably two projections are
generated, one for each axis in a two-dimensional pixel area plane.

In another embodiment, each band pass filter comprises a look-up table containing filter indicators which are
generated off-line.

Ideally, the step of analysing the filter output signals comprises the further steps of determining maximum limits in 
the pixel area for a feature and generating a bounding box according to said limits. 
In a further embodiment, the image processing for generation of the eye area location signals comprises the step
of correlation with a template, and preferably the image signal is normalised before correlation.

In another embodiment, the V component only of the input image signal is used for generation of the eye location
signals.

In a further embodiment, the tracking signals which are generated are post-processed to generate a facial
characteristic signal representing both location and positional characteristic data, said signal being generated by passing the
tracking signals through logic devices.

According to another aspect, the invention provides an image processing apparatus comprising :-

means for receiving an input image signal in H,S,V format;

a facial area band pass filter;

means for passing at least part of the input image signal through the facial area band pass filter and analysing the
output of the filter to generate a facial area location signal;

a mouth location band pass filter;

means for passing at least part of the input image signal through the mouth location band pass filter and for
analysing the output of the filter within the face pixel area according to the facial area location signal;

processing means for processing at least part of the input image signal within the facial pixel area according to the
facial area location signal to generate eye location signals; and

means for outputting said facial area location, mouth location, and eye location signals as output tracking signals.

Preferably only the H and S components of the input image signal are passed through the facial area band pass
filter.

In one embodiment only the S and V components of the input image signal are passed through the mouth location
band pass filter.

In a further embodiment, the apparatus further comprises post-processing logic devices comprising means for
receiving the tracking signals and generating a facial characteristic signal representing both location and positional
characteristic data.

The invention will be more clearly understood from the following description of some embodiments thereof, given
by way of example only with reference to the accompanying drawings, in which :-

Fig. 1 is an overview diagram showing an image processing apparatus for generation of a feature extraction
tracking signal;

Figs. 2(a) to 2(p) are detailed diagrams showing various devices making up the apparatus of Fig. 1 in more detail;
and

Figs. 3(a) and 3(b) are detailed diagrams showing construction of a post-processor of the apparatus of Fig. 1.

Referring to the drawings, there is shown an image processing apparatus comprising a facial part detection unit 31
and a post-processor 50. The facial part detection unit 31 takes an input video signal from a camera, captures images
and detects the facial parts such as mouth, eyes, eye pupils, eyebrows etc. by colour region monitoring and determines
their positions within that image. The output of the facial part detection unit 31 is a feature extraction tracking signal
having a set of positional parameters for the facial parts. These parameters are :-

Mlx, Mly, Mhx, Mhy :   Specify the Mouth Box
Slx, Sly, Shx, Shy :   Specify the Face Box
LEx, LEy :             Specify the Left Eye Position
REx, REy :             Specify the Right Eye Position
RPx, RPy :             Specify the Right Pupil Position
RBx, RBy :             Specify the Right Eyebrow Position
LPx, LPy :             Specify the Left Pupil Position
LBx, LBy :             Specify the Left Eyebrow Position

The manner in which these parameters are generated is described in more detail below. It will be appreciated that 
such tracking signals are very comprehensive and would be of benefit in a wide range of applications such as sign lan- 
guage communication and data capture. 

The post processor 50 uses the positional parameters to generate a set of facial characteristics so that the facial 
feature position and orientation can be expressed. These characteristics are :- 

Mouth Openness in X Orientation 
Mouth Openness in Y Orientation 
Face Rotation in X Orientation 
Face Rotation in Y Orientation 
Face Rotation in Z Orientation
Eye Direction in Horizontal Position 
Eye Direction in Vertical Position 
Eyebrow Vertical Position 

It would be possible to produce other information such as distance by suitably processing the tracking signal.
Construction of the post-processor 50 is shown in Figs. 3(a) and 3(b), whereby various average, subtracter, subtract and
divide, and multiply circuits are used.
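
The description does not give the arithmetic performed by the post-processor 50, only that subtract and divide circuits are among those used. Purely as an illustration of how a characteristic might be derived from the positional parameters, the following Python sketch estimates mouth openness as the mouth box extent divided by the face box extent. The function name and the formula itself are assumptions, not taken from Figs. 3(a) and 3(b).

```python
def mouth_openness(mouth_box, face_box):
    """Illustrative (assumed) mouth openness: mouth extent / face extent.

    mouth_box is (Mlx, Mly, Mhx, Mhy) and face_box is (Slx, Sly, Shx, Shy),
    i.e. the low and high corners of each box.
    """
    mlx, mly, mhx, mhy = mouth_box
    slx, sly, shx, shy = face_box
    # Subtract: extents of the mouth box. Divide: normalise by face size,
    # so the result does not depend on how large the face is in the image.
    open_x = (mhx - mlx) / (shx - slx)
    open_y = (mhy - mly) / (shy - sly)
    return open_x, open_y
```

Normalising by the face box is one possible design choice; it keeps the value comparable when the subject moves towards or away from the camera.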

Referring now to Fig. 1, the facial part detection unit 31 is now described in more detail. The function of the unit 31
is to provide a feature extraction tracking signal having feature extraction parameter values which are used by the post
processor 50 to generate a face vector. The tracking signal may alternatively be independently outputted.

The facial part detection unit 31 provides five different feature extraction tracking signals, namely A, B, C, D and E.

The A output represents the mouth coordinates and is provided by the following devices :-

a band pass device 3102;

a crop picture device 3110;

a smoothing device 3111;

an XY projection device 3112;

a find max device 3113; and

a bounding box search device 3114.

The output B specifies the face box and is provided by the following devices:- 

a band pass device 3103; 

an XY projection device 3107; 

a find max device 3108; and

a bounding box search device 3109. 

Crop picture, find max/min and normalise grey scale devices 3104, 3105 and 3106 respectively provide
pre-processing normalisation for the remaining outputs, C, D and E. The output C represents the left and right eye positions
and is provided by a device 3115 which correlates with an eye template, and a find min device 3116.
The output D specifies the right pupil and eyebrow positions and is provided by a right eye detection device 3117.
Finally, the output E represents the left pupil and eyebrow positions and is generated by a left eye detection device
3118.

The converter 3101 takes as input a video signal (R, G, B, Composite, etc.) and outputs digital values of the colour
represented in the HSV colour domain. The output is passed to the band pass devices 3102 and 3103 and the
pre-processing normalisation devices 3104 to 3106. The band pass device 3102 detects mouth colour and the band pass
device 3103 detects skin colour. The skin colour detection signal passes to the face position detection devices 3107,
3108 and 3109, which produce a box which gives the position of the face in the image. The facial box coordinates are
passed to mouth position detection devices 3110-3114, which search the facial box region to determine the position of
the mouth in the image. It is of course implied that the mouth position is to be found within the facial box.

The pre-processing normalisation devices 3104 to 3106 normalise the pixels in the facial box before outputting this
image to eye position devices 3115 and 3116 and the pupil and eyebrow position detection devices 3117 and 3118. The
purpose of this is to increase the accuracy of the correlation results in the eye position detection. The eye position
detection devices 3115 and 3116 correlate the facial area of the normalised image with pre-stored eye templates to
determine the location of the eyes, and produce two X,Y coordinates which specify the eye locations within the image.
These eye position coordinates are passed to the pupil and eyebrow position detection devices 3117 and 3118, which
use these coordinates to obtain areas around each eye which are then post-processed to obtain the pupil and eyebrow
positions for each eye. An important aspect of operation of the unit 31 is operation of the band pass devices to filter the
H,S,V data and only pass through data which is shown to be present in a colour template of the skin and face.
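
The eye position stage is described only as correlation with an eye template followed by a find min step. A minimal Python sketch of one way this could work is given below. It assumes that the correlation measure is a sum of absolute differences, so that the best match is the minimum; that measure, and the name match_template, are assumptions and are not stated in the description.

```python
def match_template(image, template):
    """Slide the template over a grey scale image and find the best match.

    Assumed measure: sum of absolute differences (lower is better), which
    is why the best position is the one with the minimum score.
    Returns ((x, y), score) for the top-left corner of the best match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_score, best_pos = None, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            score = sum(
                abs(image[y + j][x + i] - template[j][i])
                for j in range(th)
                for i in range(tw)
            )
            # "Find min": keep the lowest score seen so far.
            if best_score is None or score < best_score:
                best_score, best_pos = score, (x, y)
    return best_pos, best_score
```

A difference measure of this kind depends on absolute grey levels, which is consistent with the stated reason for normalising the facial area before correlation.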

Referring now to the various diagrams of Fig. 2, the devices 3101 to 3118 are described in more detail.

As shown in Fig. 2(a), the colour conversion device 3101 comprises an ADC, and a look-up table for each of the R,
G and B components, all connected to an RGB to HSV look-up table. There are several different implementations of this
conversion which will be known to those people skilled in the art.
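
As one software illustration of the conversion performed by the device 3101, the following Python sketch uses the standard library colorsys module to convert 8-bit R, G, B values to 8-bit H, S, V values. The scaling of all three outputs to the range 0 to 255 and the name rgb_to_hsv8 are assumptions; the hardware uses look-up tables rather than arithmetic.

```python
import colorsys


def rgb_to_hsv8(r, g, b):
    """Convert 8-bit R, G, B values to H, S, V values scaled to 0..255."""
    # colorsys works on floats in the range 0.0..1.0.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 255), round(s * 255), round(v * 255)
```

With 8-bit components, any pair such as (H, S) or (S, V) can serve directly as a 16-bit address into a look-up table.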

The input stage has an arrangement in which the S and V components are directed to the mouth detection filter
3102 and the H and S components are directed to the face detection filter 3103. Thus, each of these series of circuits
must only process two components and may therefore be quite simple. It has been found that a combination of H (hue),
which is essentially wavelength data, together with S (saturation) is particularly effective for detecting skin. It has also
been found that the S and V (value) components are particularly effective for distinguishing the mouth area within the
already identified facial (skin) area.

The purpose of the band pass device 3102 is to filter the S,V data and only pass through data which has been
shown to be present in a colour template of the mouth. The device 3102 is shown in circuit form in Fig. 2(b) and is
implemented as a look-up table (LUT). This may be an SRAM which is programmed off-line, or alternatively a PROM which
is programmed in production. It has the same construction as the filter 3103 for facial area detection.

The feature of receiving two of the H, S, and V components for each of the mouth and face series of processing
circuits is important. This avoids the need for the very large memories which are commonly required for the prior R, G,
B systems and, further, avoids the need for backprojection in colour histogram matching techniques. Instead, band pass
filters 3102 and 3103 are used. The two components (S,V for mouth area, H,S for face) form an address for the look-up
table, which stores a value for each address. The table values are generated off-line according to reference mouth and
face patterns. At its simplest, the values may be at the bit-level giving YES or NO indications for the particular S,V or
H,S combinations. The XY projection devices 3112 and 3107 perform the next fundamental processing steps by
mapping the retrieved table values over the pixel area and generating XY projections. Once this has been done, the next
steps of finding maximum limits and searching for the bounding box can be easily implemented.
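
The look-up table form of the band pass filter can be sketched in Python as follows. The sketch assumes 8-bit components and the simplest bit-level table, in which an entry is 1 (YES) if the colour pair occurred in the reference pattern and 0 (NO) otherwise. The names build_lut and band_pass are illustrative only.

```python
def build_lut(reference_pairs, size=256):
    """Off-line step: mark every colour pair seen in the reference pattern.

    reference_pairs holds (a, b) tuples, e.g. (H, S) for skin or (S, V)
    for the mouth. The two components together form the table address.
    """
    lut = [[0] * size for _ in range(size)]
    for a, b in reference_pairs:
        lut[a][b] = 1  # YES: this combination belongs to the template
    return lut


def band_pass(image_pairs, lut):
    """On-line step: replace each (a, b) pixel by its table value."""
    return [[lut[a][b] for (a, b) in row] for row in image_pairs]
```

For 8-bit components the table has 65,536 entries, whereas a table addressed by all three 8-bit R, G, B values would need over 16 million, which illustrates the memory saving referred to above.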

To put it simply, the band pass filtering and the XY projection over the pixel area are the fundamental steps and can
be implemented by simple circuits. Further, the down-stream steps are very simple to implement.

The purpose of the crop picture device 3110 is to limit the image processing tasks to only the area determined by
the face position detection section, as it receives the facial area information from the device 3109. There are two reasons
for doing this. Firstly, as only a fraction of the image is processed, this increases the number of frames which can be
processed in a given time period. Secondly, it allows local operations such as normalization to be done on the facial
area alone, unaffected by external influences such as bright light sources in other parts of the picture and random
background noise. This increases accuracy in such tasks as eye-tracking.

The purpose of the smoothing device 3111 shown in Fig. 2(d) is to aid mouth position detection in the subsequent
image processing by the devices 3112-3114. It will be noted that the face position detection stage (3107-3109) and the
mouth position detection stage (3110-3114) share several common tasks, namely XY projection, find max and search
for bounding box. However, the mouth position detection stage includes two extra tasks which are not present in the face
position detection, namely crop picture and smoothing. The purpose of the crop picture device 3110 is explained above.
The reason that smoothing is not present in the face position detection derives from the fact that the task being
undertaken is facial parts identification and position location. This implies that the face will occupy a large area in the input
image. In any image processing task there is a level of background noise due to a variety of factors, for example,
inaccuracies in the conversion of analogue data to digital data, stroboscopic effects from foreign lighting sources, light
deflection from glasses, etc. These add noise to the processed image. In the detection of the facial area, since the skin will
cover a large percentage of the input image, there is a considerable number of pixels which will be identified as
belonging to the skin. Therefore, the background noise will have little or no effect on the results obtained from the face position
detection. However, the mouth occupies a much smaller area, and therefore the background noise will have a much
greater effect on obtaining correct results from the mouth position detection stage. The probability that a mouth pixel is
mistaken for a skin pixel and vice-versa is high and affects mouth area detection. However, in the case of face position
detection the fact that a mouth pixel is mistaken for a skin pixel actually helps in the location of the facial area, since the
mouth area lies within the face area. The opposite applies for the mouth position detection. To help overcome
this problem the image is smoothed before performing further image processing steps. It is assumed that the
background noise is random in nature and will occur randomly over the image, whereas the mouth pixel recognition is highly
concentrated in a single area. By averaging over an area, the effects of the background noise can be reduced whilst
enhancing the areas where recognition is highly concentrated. The principle behind the device 3111 is to average all
pixels within an 8 x 8 area, and place the result at the central pixel point. The operation of this circuit and its underlying
principles will be understood by those skilled in the art. The resulting image is a smoothed representation of the input
image.
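
The averaging principle of the smoothing device can be sketched in Python as below. Two details are assumptions, since the description does not specify them: the window for a pixel covers four rows and columns before it and three after it (an 8 x 8 window has no exact centre), and at the image border the average is taken over the part of the window that lies inside the image.

```python
def smooth(image, size=8):
    """Average each pixel over a size x size neighbourhood.

    image is a list of rows of integer pixel values. Border handling
    (clipping the window to the image) is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    half = size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = count = 0
            for yy in range(max(0, y - half), min(height, y + half)):
                for xx in range(max(0, x - half), min(width, x + half)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total // count  # integer average, as in hardware
    return out
```

An isolated noise pixel is divided by up to 64 in the output, while a dense cluster of mouth pixels keeps a high average, which is the effect the smoothing stage relies on.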

Further image processing tasks are performed on the smoothed image by the devices 3112-3114, namely XY
projection, find max and search for bounding box. These devices function in the same manner as the devices 3107 to 3109
in the face detection stage, which are described in detail below. The output from the device 3114 is a signal indicating a
box which defines an area in the input image where the mouth is located.

The purpose of the device 3107 shown in Fig. 2(e) is to perform an XY projection on the image which is outputted
from the device 3103 to effectively map the filter output over the pixel area. The device 3107 can be divided into two
sections which operate in the same fashion, the left hand side which evaluates the X projected data, and the right hand
side which evaluates the Y projected data. The circuit comprises a 256 x 16 bit SRAM which is used to store the X
projected data, a multiplexer to arbitrate access to the databus of the SRAM, a multiplexer to arbitrate access to the
address bus of the SRAM, an adder to perform additions on the projected data, and a register to act as an intermediate
data store. The circuit functions in the following manner. It is assumed that the SRAM can have all bits set to zero, i.e.
the SRAM can be cleared, at the beginning of every XY projection; however, this function is not shown in the diagram.
It is also assumed that the maximum image size is 256 x 256 pixels; however, to those skilled in the art, it is possible to
adapt the circuits to handle larger images. Pixel data is inputted into the circuit through I/P PIXEL DATA, with the address
of each pixel being inputted through Row Addr and Column Addr. It is assumed that the Select line is set so that Row
Addr signals impinge upon the SRAM, and that the bi-directional buffer is configured to allow data to be read from the
SRAM into the ADDER. The Row Addr reads the present X projection value from the SRAM into the ADDER circuit.
The ADDER adds together the data from the SRAM and that of the I/P PIXEL DATA and puts the result into the
REGISTER. The bi-directional buffer is then configured to write data from the REGISTER into the SRAM so that the new
result is stored. The next pixel value is then inputted into the circuit, with the new Row Addr signal being used to select
the appropriate X storage location. The process is repeated until all pixels in the image have been processed. By
changing the select switch to allow the External Row Addr to impinge upon the SRAM, it is possible to read out the final
X projection values. The operation of the Y projection is carried out in parallel to the X projection.
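
The read, add and write-back cycle described above amounts to accumulating the filter output along each axis. A minimal Python sketch follows, with two lists standing in for the two SRAMs. It assumes that the X projection is indexed by the horizontal pixel coordinate and the Y projection by the vertical coordinate; the mapping of address lines to memories in Fig. 2(e) is not reproduced here.

```python
def xy_projection(image):
    """Accumulate pixel values along both axes of a filtered image.

    Returns (x_proj, y_proj): x_proj[c] is the sum of column c and
    y_proj[r] is the sum of row r.
    """
    height, width = len(image), len(image[0])
    x_proj = [0] * width   # cleared at the start of every projection
    y_proj = [0] * height
    for row in range(height):
        for col in range(width):
            value = image[row][col]
            # Read the stored sum, add the pixel, write the result back.
            x_proj[col] += value
            y_proj[row] += value
    return x_proj, y_proj
```

For a 256 x 256 image of one-bit filter outputs, no sum can exceed 256, so a 16 bit word per location is ample.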

The purpose of the device 3108 shown in Fig. 2(f) is to find the maximum value in the X and Y projection data so
that an X and Y location which lies within the facial area can be found. The device 3108 can be divided into two sections,
which both process in parallel and operate in the same fashion. The basic principle of the circuit is that each final
projection value is compared using a comparator CMP with a maximum value stored in REGISTER A. If the projection data
value is greater than the value in REGISTER A then this new value is stored in REGISTER A, whilst simultaneously the
column address is stored in REGISTER B. Data from the XY projection device 3107 is read out serially, and impinges
upon REGISTER A and CMP, whilst the address of the projection value impinges upon REGISTER B. The output of
REGISTER A is also outputted to CMP, where the contents of REGISTER A and the Projection X value are compared.
If the result indicates that the projected value is greater than the contents of REGISTER A then a signal is generated in
conjunction with the PIXEL CLK which loads the new data value into REGISTER A, whilst simultaneously loading the
address of the pixel into REGISTER B. This process is repeated for all X (Y) projected values. The values remaining in
REGISTER A and B indicate the maximum projected value and the location at which it occurred.
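
The compare and load behaviour of REGISTER A and REGISTER B can be sketched in Python as follows. The initial register value of zero is an assumption; the name find_max is illustrative.

```python
def find_max(projection):
    """Return (max_value, max_posn) for one projection.

    Mirrors the circuit: a value is loaded only when it is strictly
    greater than the stored maximum, so the first peak wins on ties.
    """
    max_value = 0  # REGISTER A (assumed cleared before each frame)
    max_posn = 0   # REGISTER B
    for address, value in enumerate(projection):
        if value > max_value:      # comparator CMP
            max_value = value      # load REGISTER A
            max_posn = address     # load REGISTER B at the same time
    return max_value, max_posn
```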

The purpose of the device 3109 shown in Fig. 2(g) is to determine the limits for a bounding box which will enclose
the skin area. This area will be used in subsequent image processing tasks. The circuit can be divided into two identical
sections, the left hand side relating to finding the X boundaries, the right hand side to the Y boundaries. This circuit uses
information from the device 3108, namely MAX X POSN, MAX X, MAX Y POSN and MAX Y. The operation of the circuit
is to derive a threshold value, X_TH, for the X data using MAX X, and a threshold value, Y_TH, for the Y data using MAX
Y. This is achieved by multiplying MAX X (Y) by a constant which is less than one. The constant multipliers may be
different for the X and Y data. The next stage is to determine the lower boundary. By starting at the MAX X POSN and
repeatedly decrementing its position whilst checking if the X Projection data at this new location is less than the
threshold value, X_TH, the point at which the X projected data goes beneath the threshold can be found. This is the X LOWER
BOUND. By starting at the MAX X POSN and repeatedly incrementing its position whilst checking if the X Projection
data at this new location is less than the threshold value, X_TH, the point at which the X projected data goes beneath the
threshold can be found. This is the X UPPER BOUND. The calculation of the Y boundaries follows a similar fashion.
The circuit operation is as follows. The MAX X data from the device 3108 is multiplied by CONST in MULT and the
result, which is X_TH, is passed to CMP, where all data from I/P PROJECTION X DATA will be compared to X_TH. The
value MAX X POSN, also from the device 3108, is loaded into the COUNTER using the LOAD signal, originating from device 50.
The device 50 also provides control signals RST1 and RST2 which reset the RSA and RSB flip flops into the RESET
state. The COUNTER value provides the address to look up the final X Projection values in device 3107. The multiplexer in the device
3107 is set so that the address from the External Row Addr impinges upon the SRAM. In this way, the
X Projection data values can be read from the SRAM and into the device 3109. Data from the device 3107 arrives at I/P
PROJECTION X DATA, where it is compared against the X_TH value. If the result of the comparator, CMP, shows that the
I/P PROJECTION X DATA is less than X_TH, then a signal is generated which causes the RS flip flops RSA and RSB to
be put in the SET position. The address in the COUNTER is decremented until the comparator, CMP, shows that the
threshold has been crossed, at which point flip flop RSA is placed in the SET state. The signal from the flip flop
is used to load REGISTER A with the current COUNTER value, which indicates the X LOWER BOUND. The COUNTER
is then loaded with the MAX X POSN once again using the LOAD signal. This time, instead of decrementing the
COUNTER, the COUNTER is incremented until the data once again crosses the threshold value, X_TH. This time the RSB flip
flop is placed in the SET state and the output of the RSB flip flop is used to load REGISTER B with the value of COUNTER,
which this time indicates the X UPPER BOUND. The operation for the Y projected values is the same. At the end of this
process, the flip flops RSA and RSB are reset using the control signals RST1 and RST2 from a control logic unit, not
shown, and the process is repeated for the next frame.
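
The bounding box search for one axis can be sketched in Python as below. The value 0.5 for the constant is an assumption (the description only requires a constant less than one), as is stopping at the edge of the projection if the data never falls below the threshold.

```python
def find_bounds(projection, max_value, max_posn, const=0.5):
    """Search outwards from the peak for the first values below threshold.

    Returns (lower_bound, upper_bound) as positions in the projection.
    const is an assumed example value for the threshold multiplier.
    """
    threshold = const * max_value  # MULT: X_TH = CONST * MAX X

    # Decrement the counter from the peak to find the lower bound.
    lower = max_posn
    while lower > 0 and projection[lower] >= threshold:
        lower -= 1

    # Reload the counter with the peak and increment for the upper bound.
    upper = max_posn
    while upper < len(projection) - 1 and projection[upper] >= threshold:
        upper += 1

    return lower, upper
```

Running the same function on the Y projection with MAX Y and MAX Y POSN gives the other two sides of the box.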

At this stage the bounding box for the facial area has been discovered and a preliminary check can be carried out.
If the area of the box is found to be extremely small, of the order of less than 20 pixels, then it can be assumed that there
is no face within the image, and the subsequent image processing tasks of finding the mouth and eyes can be
abandoned.
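
The preliminary check can be expressed in a few lines of Python. The sketch assumes that the box area is the product of its X and Y extents and uses the figure of 20 pixels mentioned above as the default limit; the name face_present is illustrative.

```python
def face_present(x_lower, x_upper, y_lower, y_upper, min_area=20):
    """Return False when the bounding box is too small to be a face."""
    area = max(0, x_upper - x_lower) * max(0, y_upper - y_lower)
    return area >= min_area
```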

The pre-processing normalisation section uses the devices 3104-3106. The purpose of the pre-processing
normalisation section is to normalise the image before performing correlation to increase the accuracy of the results. This
section does not perform image processing on colour information, but on a grey scale image. The V video signal of the HSV
video standard is a grey scale representation of the input image.

The purpose of the crop device 3104 shown in Fig. 2(c) is to limit the image processing tasks to only the area
determined by the face position detection section and not the whole image. The reasons for doing this are explained above.

The find max/min device 3105 is shown in circuit form in Fig. 2(h). The purpose of this device is to find the maximum
and minimum pixel values within the image. This information is to be used in the subsequent image processing stage.
The device 3105 is comprised of two registers, REGISTER A and REGISTER B, and two comparators, CMP A and
CMP B. REGISTER A and CMP A are used to find the MAX VALUE, whereas REGISTER B and CMP B are used to
find the MIN VALUE. The pixel data from the input image is inputted serially via the PIXEL DATA input. The data
impinges upon both registers and both comparators. REGISTER A is used as a temporary storage area for the
maximum value, MAX VALUE, whereas REGISTER B is used as a temporary storage for the minimum value, MIN VALUE.
At the beginning of each frame REGISTER A must be set to 0 and REGISTER B must be set to 255 by a control unit
which transmits a control signal CLR. The output of REGISTER A is input to CMP A, where it is compared to the
input data. If the result from the comparator CMP A shows that the input data is greater than the data stored in
REGISTER A then the comparator generates a LOAD signal which loads the input pixel data into REGISTER A. The operation
for the minimum value uses the same principle, with the comparator generating a load signal when the result from the
comparator CMP B shows that the input data is less than the data stored in REGISTER B. After all pixels in the input image
have been processed through the circuit, the maximum value resides in MAX VALUE and the minimum value resides
in MIN VALUE. Before the next input image can be processed both registers must be initialised to their respective
values.
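
The operation of the two registers and two comparators can be sketched in Python as follows, assuming 8-bit pixel values supplied serially as a flat sequence. The name find_max_min is illustrative.

```python
def find_max_min(pixels):
    """Return (max_value, min_value) of a serial stream of 8-bit pixels."""
    max_value = 0    # REGISTER A, set to 0 by CLR at the start of a frame
    min_value = 255  # REGISTER B, set to 255 by CLR at the start of a frame
    for pixel in pixels:
        if pixel > max_value:  # CMP A generates a LOAD for REGISTER A
            max_value = pixel
        if pixel < min_value:  # CMP B generates a LOAD for REGISTER B
            min_value = pixel
    return max_value, min_value
```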

The normalise grey scale device 3106 is shown in circuit form in Fig. 2(i). The purpose of this stage is to translate
the input image in such a way that it uses the full range of possible values, namely 0 to 255. The device 3105 processed
the image and found the maximum and minimum values. In an 8 bit grey scale representation the minimum value possible is
0, and the maximum value possible is 255. However, results from the device 3105 will indicate that from frame to frame
the maximum and minimum values found will not be the maximum and minimum possible. Therefore, it is advantageous
that a method be devised that changes the input image so that it fits the full range of values. The simplest method, as
shown in Fig. 2(i), is that of a look-up table, 31068, which for an 8 bit input and 8 bit output requires a 256x8 bit memory.
This look-up table must be programmed frame by frame, as the maximum and minimum values will change frame by
frame. The algorithm for programming the look-up table is as follows:

\[
\mathrm{Coeff}(x) =
\begin{cases}
255, & \mathrm{Max} < x \le 255 \\
(\mathrm{int})\bigl(255\,(x - \mathrm{Min})/(\mathrm{Max} - \mathrm{Min})\bigr), & \mathrm{Min} \le x \le \mathrm{Max} \\
0, & 0 \le x < \mathrm{Min}
\end{cases}
\]

where the values Max and Min refer to the values of MAX VALUE and MIN VALUE calculated by the device 3105. Max
and Min must be between the values of 0 and 255, and Max > Min. Fig. 2(i) shows that the circuit of the device 3106 is
made up of parts 31061-31069. The circuit has two modes of operation: the first, in which the coefficients of the
look-up table are calculated, and the second, in which these coefficients are used to convert the input image into the
normalized output image. The parts 31061-31067 are involved with the calculation of the coefficients, which are stored
in the SRAM 31068. The data is transformed by allowing the PIXEL DATA to impinge upon the SRAM as the address, the control
unit setting the SELECT control signal to the correct state. At the start of each frame all locations in the LUT are set to
zero and the MIN VALUE is loaded into a counter, indicated as the part 31061. The MIN VALUE along with MAX VALUE
are obtained from the device 3105. The counter is loaded using a LOAD control signal from the control logic unit 50, not
shown. The output of the counter is inputted to a comparator CMP, which compares the counter value with the MAX
VALUE. If the value of the counter is greater than MAX VALUE then this indicates that all coefficients have been loaded
into the look-up table and that the normalization process can start. The comparator CMP outputs a control signal named
FINISHED. The coefficient calculation can be divided into three steps. In the first step two calculations occur in parallel,
namely:

(a) x - MIN VALUE, where x is the present counter value

(b) MAX VALUE - MIN VALUE
then,

(c) CONST x (result of (a)), using part 31066
then,

(d) (result of (c))/(result of (b)), using part 31067

The value of CONST is set to 255. The SELECT switch of the multiplexer MUX is set to allow the output of the counter
to impinge upon the address bus of the SRAM. By setting the R/W line to the SRAM to write, the results from the
division part 31067 are written into the SRAM at the location specified by the counter 31061. The counter is then
incremented and the process repeated until the comparator CMP indicates that all coefficients have been calculated and
stored in the SRAM. At this point the look-up table can be used to normalise the input image. The SELECT signal is
switched to allow the PIXEL DATA to impinge upon the address bus of the SRAM and the R/W control signal is switched to
read. The input image is then presented to the SRAM where it is transformed by the look-up table and outputted to the
NORMALIZED PIXEL DATA stage. When all pixels have been transformed the counter is again loaded with MIN VALUE, all LUT
locations are set to zero, and the process is repeated.
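The two modes of operation of the device 3106 can be sketched as follows. This is an illustrative software model under stated assumptions, not the circuit itself: the function names build_lut and normalise are hypothetical, integer division stands in for the division part 31067, and entries above MAX VALUE are left at zero because, as in the circuit description, only the locations from MIN VALUE to MAX VALUE are written and no pixel of the frame can exceed MAX VALUE.

```python
def build_lut(min_value, max_value, const=255):
    """Mode 1: fill the 256-entry look-up table for one frame."""
    if not (0 <= min_value < max_value <= 255):
        raise ValueError("require 0 <= Min < Max <= 255")
    lut = [0] * 256                  # all LUT locations cleared at frame start
    counter = min_value              # counter 31061 loaded with MIN VALUE
    while counter <= max_value:      # FINISHED once counter > MAX VALUE
        a = counter - min_value      # step (a)
        b = max_value - min_value    # step (b)
        c = const * a                # step (c), multiplier part 31066
        lut[counter] = c // b        # step (d), division part 31067
        counter += 1
    return lut


def normalise(pixels, lut):
    """Mode 2: each pixel value addresses the table and is replaced by its entry."""
    return [lut[p] for p in pixels]
```

With MIN VALUE 50 and MAX VALUE 150, for instance, the input range 50 to 150 is stretched over 0 to 255.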

The output of the pre-processing-normalisation section is passed to two further sections, namely the eye position 
detection stage which finds the eye positions, and the pupil and eyebrow position detection stage which finds the eye 
pupil and eyebrow positions. 

The eye position detection stage has two devices, the correlate with eye template device 3115 and the find min
device 3116. The eye position detection is processed twice, once with templates for the left eye, and once with the
templates for the right eye.

The correlate with eye template device 3115 is shown in circuit form in Fig. 2(i). The purpose of the device 3115 is
to correlate the input image against several pre-stored templates of right and left eye images. The result closest to zero
from the right eye correlation indicates the position of the right eye, and the result closest to zero from the left eye
correlation indicates the position of the left eye. The correlator circuit implements the following
mathematical function in integer arithmetic, using only 4 bits of accuracy:

\[
P(x,y) = \sum_{i=-8}^{7} \sum_{j=-8}^{7} \bigl( P(x+i,\,y+j) - T(i,j) \bigr)^2 \qquad \text{Eqn. (1)}
\]

where P is the input image, T is the template image, and x and y are positional indicators within the input image. This
equation is computed for each pixel in the output image.

The algorithm computes the square of all differences between pixels in the input image and the template image. In
the case that the input image and the template image are identical the result is zero. The accuracy can be improved by
adding more bits; however, this leads to a more complex hardware implementation.

Equation (1) can be expanded to show the basic image processing steps which are needed to implement this equation.

\[
P(x,y) = \sum_{i=-8}^{7} \sum_{j=-8}^{7} \bigl( P(x+i,\,y+j)^2 + T(i,j)^2 - 2\,P(x+i,\,y+j)\,T(i,j) \bigr) \qquad \text{Eqn. (2)}
\]

wherein $T(i,j)^2$ is a constant, $P(i,j)^2$ is the sum of all pixels squared in the input image, and $P(i,j)T(i,j)$ is the
multiplication and summation of all pixels in the input image with the corresponding pixel in the template image.

It can be clearly seen that the algorithm can be divided into several steps, some of which can be executed in parallel.

(1) Compute $P(i,j)^2$.

(2) Compute $P(i,j)T(i,j)$.

(3) Add $T(i,j)^2$ to (1); $T(i,j)^2$ is a constant and so can be calculated off-line.

(4) Subtract twice (2) from (3).

This reduces the calculation to four basic steps.
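The equivalence of the direct sum of squared differences and its expanded form can be checked with a short sketch. The code below is an assumption-laden illustration rather than the hardware correlator: images are plain nested lists indexed as image[row][column], the function names are hypothetical, the template may be any even-sized square (16 x 16 gives indices -8 to 7), and the 4 bit accuracy limit of the circuit is not modelled.

```python
def ssd_direct(image, template, x, y):
    """Sum of squared differences between the template and the window centred on (x, y)."""
    half = len(template) // 2
    total = 0
    for j in range(-half, half):
        for i in range(-half, half):
            d = image[y + j][x + i] - template[j + half][i + half]
            total += d * d
    return total


def ssd_expanded(image, template, x, y, t_sq_sum):
    """Same quantity via the four steps: sum P^2, sum P*T, add sum T^2, subtract twice sum P*T."""
    half = len(template) // 2
    p_sq = 0   # step (1): sum of squared input pixels
    p_t = 0    # step (2): sum of input pixel times template pixel
    for j in range(-half, half):
        for i in range(-half, half):
            p = image[y + j][x + i]
            p_sq += p * p
            p_t += p * template[j + half][i + half]
    return p_sq + t_sq_sum - 2 * p_t   # steps (3) and (4)


def template_sq_sum(template):
    """Off-line constant: sum of squared template pixels."""
    return sum(t * t for row in template for t in row)
```

Because the template term does not depend on the window position, template_sq_sum needs to be evaluated only once per template.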

The device 3115 has parts 31151-31156. The parts 31151 and 31153 are 16 x 16 8 bit correlators, part 31151
performing the $P(i,j)T(i,j)$ correlation and part 31153 performing the $P(i,j)^2$ correlation. The part 31152 is a 256 x 8 SRAM
which is used as a lookup table to convert the input image pixel values to their squared values before correlation. This
is required so that numerical accuracy is maintained throughout the correlation process.

The results from the correlation are inputted to the find min device 3116 where the minimum value and the position
of the minimum value are found. A circuit of the device 3116 is shown in Fig. 2(k). It can be seen from the diagram that
the device 3116 is similar to the device 3108 and operation of both circuits is identical.

It is envisaged that the eye position detection stage can be expanded so that multiple eye templates can be
correlated and the best correlation value found. The implementation of this type of system will be clear to those skilled
in the art.

The final output from the eye position detection system is two pixel locations, (LEx,LEy) and (REx,REy), which
define the locations of the left and right eyes in the input image.

The devices 3117 and 3118 (shown in Fig. 2(l)), for right eye detection and left eye detection, make up the pupil
and eyebrow position detection stage. The purpose of the pupil and eyebrow position detection stage is to use the eye
coordinates, (LEx,LEy) and (REx,REy), obtained from the device 3116, together with the normalized image from the
device 3106, to find the positions of the eye pupil and eyebrow for both the left and right eyes.

The device 3117 for right eye detection is shown in circuit form in Fig. 2(l). The device 3117 is comprised of parts
31171-31175. The first part 31171, known as crop picture, is used to obtain a region of interest, using the right eye
coordinates (REx,REy) as the central pixel. This sub-image is then outputted to the part 31172, known as X Projection,
which performs an X projection on the sub-image. The circuit to implement the part 31172 is shown in Fig. 2(n). The
functioning of the part 31172 is identical to that of the device 3107.

The data from the part 31172 is passed to the part 31173 for smoothing, where the X projection data is
smoothed from the uppermost row of the sub-image to the lowest. A circuit which implements the part 31173 is shown
in Fig. 2(m). The principle behind this circuit is that the serial input stream is averaged over four pixel values and the
output is the averaged pixel stream. In order to average, the pixels are stored in REGISTERS with the outputs being fed
to adders. The result from the adders is then outputted to a SHIFTER, which shifts the result right by two places,
corresponding to a divide by 4. The next pixel is then inputted to the circuit and stored in the first REGISTER. In parallel
the previously stored data is shifted along the REGISTER chain. The new average is then computed and outputted. This
process is repeated until all X projected data has been smoothed.
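The register chain, adders and shifter can be modelled by a four-tap running average. The sketch below is illustrative only; the function name smooth_projection is hypothetical, and it assumes the four registers hold zero before the first value arrives, so the first three outputs are averages over a partially filled chain.

```python
def smooth_projection(values):
    """Average a serial stream over four values using a shift right by two places."""
    regs = [0, 0, 0, 0]              # REGISTER chain, assumed cleared at the start
    out = []
    for v in values:
        regs = [v] + regs[:3]        # new value enters the first register, the rest shift along
        out.append(sum(regs) >> 2)   # adders followed by the SHIFTER (divide by 4)
    return out
```

The right shift discards the remainder, as an integer shifter would.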

The averaged X projected data from the part 31173 is then passed to the part 31174. The purpose of this part
is to search the averaged X projected data from the uppermost row value to the lowermost row value and find the
maximum peak in the data. This peak corresponds to the y coordinate location of the eyebrow. A circuit which implements
the part 31174 is shown in Fig. 2(o). The principle of this circuit is to locate the position where the (N+1)th data value is
less than the Nth data value, since this shows that a peak has been encountered. The (N+1)th and Nth data values are
provided by the REGISTERS, whose outputs are fed to a COMPARATOR which compares the values and outputs a
SET signal to an RS flip flop when the (N+1)th data value is less than the Nth data value. The RS flip flop is used to issue
a load signal to two REGISTERS which store the pixel value and the location at which it occurred. This data represents
the y location of the eyebrow, RBy. The RBx location is assumed to be the same as REx. Hence the brow
is now located at (RBx,RBy).
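The comparator and flip flop behaviour can be sketched as a scan that stops at the first position where the following value is smaller. This is a hedged illustration: the function name find_first_peak is hypothetical, and treating the flip flop as latching only the first such position is an interpretation of the description, not a confirmed detail of the circuit.

```python
def find_first_peak(values):
    """Return (index, value) of the first N for which values[N + 1] < values[N], else None."""
    for n in range(len(values) - 1):
        if values[n + 1] < values[n]:   # COMPARATOR sets the RS flip flop
            return n, values[n]         # latched location and pixel value
    return None                         # the data never falls, so no peak was seen
```

The returned index is a row within the cropped sub-image, so an offset for the top of the crop region would be needed to express it in full-image coordinates.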

The purpose of the find minimum part 31175 is to find the position of the pupil. This is done by finding the minimum
value in the normalised image. The circuit which is used to implement the part 31175 is shown in Fig. 2(p). Since the
operation of this circuit is identical to that of the devices 3108 and 3113 it is not explained. The output of this circuit is
the coordinate of the right eye pupil, (RPx,RPy).
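Cropping around the eye coordinate and locating the darkest pixel can be sketched as below. The function names and the size of the region of interest (half_w, half_h) are assumptions made for illustration; the specification does not state the crop dimensions. Images are nested lists indexed as image[row][column].

```python
def crop(image, cx, cy, half_w, half_h):
    """Return (sub_image, x0, y0) for a region centred on (cx, cy), clipped to the image."""
    height, width = len(image), len(image[0])
    x0, x1 = max(cx - half_w, 0), min(cx + half_w + 1, width)
    y0, y1 = max(cy - half_h, 0), min(cy + half_h + 1, height)
    return [row[x0:x1] for row in image[y0:y1]], x0, y0


def find_pupil(image, eye_x, eye_y, half_w=8, half_h=8):
    """Return the full-image (x, y) of the minimum value inside the cropped region."""
    sub, x0, y0 = crop(image, eye_x, eye_y, half_w, half_h)
    best_value, best_xy = 256, None      # 256 exceeds any 8 bit pixel value
    for r, row in enumerate(sub):
        for c, v in enumerate(row):
            if v < best_value:           # keep the first occurrence of the minimum
                best_value, best_xy = v, (x0 + c, y0 + r)
    return best_xy
```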

The device 3118 is similar to the device 3117, but differs in that it uses the coordinates of the left eye, (LEx,LEy), to crop
the image.

Referring now to Figs. 3(a) and 3(b), the following functions are carried out by the various parts of the post-processor
50 to convert the received facial part position parameters to the facial characteristics.

Initialisation Parameters

Lip Separation.y   = Mly - Mhy
Lip Separation.x   = Mlx - Mhx
Average Eye.x      = (LEx + REx)/2
Average Eye.y      = (LEy + REy)/2
Average Pupil.x    = (LPx + RPx)/2
Average Pupil.y    = (LPy + RPy)/2
Offset Eye.x       = Average Eye.x - Average Pupil.x
Offset Eye.y       = Average Eye.y - Average Pupil.y
Offset Left Brow   = LBy - Average Eye.y
Offset Right Brow  = RBy - Average Eye.y

Online Parameters

Average Eye.x      = (LEx + REx)/2
Average Eye.y      = (LEy + REy)/2
Average Pupil.x    = (LPx + RPx)/2
Average Pupil.y    = (LPy + RPy)/2
Face Centre.x      = (Slx + Shx)/2
Face Centre.y      = (Sly + Shy)/2
Mouth Centre.x     = (Mlx + Mhx)/2
Mouth Centre.y     = (Mly + Mhy)/2
Mouth Rel.x        = (Face Centre.x - Mouth Centre.x)/BOX WIDTH
Mouth Rel.y        = (Mouth Centre.y - Face Centre.y)/BOX HEIGHT
Eye Centre.x       = (Face Centre.x - Average Eye.x)/BOX WIDTH
Eye Centre.y       = (Face Centre.y - Average Eye.y)/BOX HEIGHT
Rotate.z           = (Average Eye.x - Mouth Centre.x)/10
Rotate.y           = CONST1 * Mouth Rel.y
Rotate.x           = CONST2 * Mouth Rel.x
Left Eye.x         = Right Eye.x = (Average Eye.x - Average Pupil.x - Offset Eye.x) * 10/4
Left Eye.y         = Right Eye.y = (Average Pupil.y - Offset Eye.y) * 10/12
Left Brow          = Left Brow.y - Average Eye.y - Offset Left Brow
Right Brow         = Right Brow.y - Average Eye.y - Offset Right Brow
Mouth Openness.x   = (Mlx - Mhx - Lip Separation.x)/BOX WIDTH
Mouth Openness.y   = (Mly - Mhy - Lip Separation.y)/BOX HEIGHT


The last nine variables, Rotate.z through Mouth Openness.y, which are calculated by use of the online parameters, constitute a face vector.
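The post-processing arithmetic can be sketched in software as follows. This is an illustration under several assumptions, none of which is confirmed by the specification: the tracking values arrive as a dictionary keyed by the signal names, Average Eye.y is taken as the mean of LEy and REy, the brow y values Left Brow.y and Right Brow.y are taken to be LBy and RBy, the initialisation parameters are computed once from an initial frame, and CONST1, CONST2, BOX WIDTH and BOX HEIGHT are supplied by the caller because their values are not given. All function and key names are hypothetical.

```python
def averages(t):
    """Mean eye and pupil positions from the tracking values in dict t."""
    return {
        "eye_x": (t["LEx"] + t["REx"]) / 2,
        "eye_y": (t["LEy"] + t["REy"]) / 2,   # assumption: mean of LEy and REy
        "pupil_x": (t["LPx"] + t["RPx"]) / 2,
        "pupil_y": (t["LPy"] + t["RPy"]) / 2,
    }


def init_params(t):
    """Initialisation parameters, assumed to be taken once from an initial frame."""
    a = averages(t)
    return {
        "lip_sep_x": t["Mlx"] - t["Mhx"],
        "lip_sep_y": t["Mly"] - t["Mhy"],
        "offset_eye_x": a["eye_x"] - a["pupil_x"],
        "offset_eye_y": a["eye_y"] - a["pupil_y"],
        "offset_left_brow": t["LBy"] - a["eye_y"],
        "offset_right_brow": t["RBy"] - a["eye_y"],
    }


def face_vector(t, init, box_width, box_height, const1=1.0, const2=1.0):
    """The nine face vector values for one frame of tracking values."""
    a = averages(t)
    face_cx = (t["Slx"] + t["Shx"]) / 2
    face_cy = (t["Sly"] + t["Shy"]) / 2
    mouth_cx = (t["Mlx"] + t["Mhx"]) / 2
    mouth_cy = (t["Mly"] + t["Mhy"]) / 2
    mouth_rel_x = (face_cx - mouth_cx) / box_width
    mouth_rel_y = (mouth_cy - face_cy) / box_height
    return {
        "rotate_z": (a["eye_x"] - mouth_cx) / 10,
        "rotate_y": const1 * mouth_rel_y,
        "rotate_x": const2 * mouth_rel_x,
        # the listing gives one value per axis for both eyes
        "left_eye_x": (a["eye_x"] - a["pupil_x"] - init["offset_eye_x"]) * 10 / 4,
        "left_eye_y": (a["pupil_y"] - init["offset_eye_y"]) * 10 / 12,
        "left_brow": t["LBy"] - a["eye_y"] - init["offset_left_brow"],
        "right_brow": t["RBy"] - a["eye_y"] - init["offset_right_brow"],
        "mouth_openness_x": (t["Mlx"] - t["Mhx"] - init["lip_sep_x"]) / box_width,
        "mouth_openness_y": (t["Mly"] - t["Mhy"] - init["lip_sep_y"]) / box_height,
    }
```

When the frame used for initialisation is passed back in as the current frame, the offset-corrected entries (eye x, both brows and both mouth openness values) evaluate to zero.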

The face vector is transmitted from the post-processor 50 for use in the desired application. Because of the
comprehensive nature of this signal, it has a wide range of uses.

It will be appreciated that the invention provides an apparatus which is very simple because of the nature and routing
of input image data. Further, the output signals are very comprehensive in their content, including location data in
terms of regions of pixels rather than edges. This data can additionally be used to provide very useful facial
characteristic data signals for down-stream processing. Such processing may include capture of expressions, sign language
communication, videophone communication, computer animation or facial substitution in video images, both in single
frame and real time video acquisition.
1. An image processing method comprising the steps of receiving an input image signal and generating a feature
extraction tracking signal, characterised in that :-

the input image signal is in H,S,V format;

a facial area location signal (Slx, Sly, Shx, Shy) is generated by passing at least part (H, S) of the input image
signal through a band pass filter (3103) and analyzing the output of the filter;

a mouth location signal (Mlx, Mly, Mhx, Mhy) is generated by passing at least part (V, S) of the input image
signal through a band pass filter (3102) and analysing the output of the filter within the facial pixel area according
to the facial area location signal;

eye location signals (LEx, LEy, REx, REy) are generated by processing at least part of the input image signal
within the facial pixel area according to the facial area location signal; and

the facial area location, mouth location and eye location signals are outputted as output tracking signals.

2. A method as claimed in claim 1, wherein only two of the H,S,V input image components are used for generation of
the facial area location signal.

3. A method as claimed in claim 2, wherein the H and S components are passed through the band pass filter (3103)
for generation of the facial area location signal.

4. A method as claimed in any preceding claim, wherein only two of the H,S,V input image components are used for
generation of the mouth location signal.

5. A method as claimed in claim 4, wherein the S and V components are passed through the band pass filter (3102)
for generation of the mouth area location signal.

6. A method as claimed in any preceding claim, wherein the band pass filter output signals are analyzed by mapping
the output data over the pixel area and generating a projection in a mapping axis and analyzing said projection.

7. A method as claimed in claim 6, wherein two projections are generated, one for each axis in a two-dimensional
pixel area plane.

8. A method as claimed in any preceding claim, wherein each band pass filter comprises a look-up table (LUT)
containing filter indicators which are generated off-line.

9. A method as claimed in any of claims 6 to 8, wherein the step of analyzing the filter output signals comprises the
further steps of determining maximum limits in the pixel area for a feature and generating a bounding box according
to said limits.

10. A method as claimed in any preceding claim, wherein the image processing for generation of the eye area location
signals comprises the steps of correlation with templates.

11. A method as claimed in claim 10, wherein the image signal is normalised before correlation.

12. A method as claimed in claims 10 or 11, wherein the V component only of the input image signal is used for
generation of the eye location signals.

13. A method as claimed in any preceding claim, wherein the tracking signals which are generated are post-processed
to generate a facial characteristic signal representing both location and positional characteristic data, said signal
being generated by passing the tracking signals through logic devices.

14. An image processing apparatus comprising :- 

means (3101) for receiving an input image signal in H,S,V format;

a facial area band pass filter (3103);

means for passing at least part of the input image signal through the facial area band pass filter and analyzing
the output of the filter to generate a facial area location signal;

a mouth location band pass filter (3102);

means for passing at least part of the input image signal through the mouth location band pass filter (3102) and
(3110-3114) for analyzing the output of the filter within the face pixel area according to the facial area location
signal;

processing means (3104-3106, 3115-3117) for processing at least part of the input image signal within the
facial pixel area according to the facial area location signal to generate eye location signals; and

means for outputting said facial area location, mouth location, and eye location signals as output tracking
signals.

15. An apparatus as claimed in claim 14, wherein only the H and S components of the input image signal are passed
through the facial area band pass filter (3103).

16. An apparatus as claimed in claims 14 or 15, wherein only the S and V components of the input image signal are
passed through the mouth location band pass filter (3102).

17. An apparatus as claimed in any of claims 14 to 16, further comprising post-processing logic devices (50) comprising
means for receiving the tracking signals and generating a facial characteristic signal representing both location and
positional characteristic data.